Feature Screening for Ultrahigh Dimensional Categorical Data with Applications.

نویسندگان

  • Danyang Huang
  • Runze Li
  • Hansheng Wang
چکیده

Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be directly applied for detection of important interaction effects. We further show that the proposed procedure possesses screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies, and illustrate the proposed method by two empirical datasets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A selective overview of feature screening for ultrahigh-dimensional data.

High-dimensional data have frequently been collected in many scientific areas including genomewide association study, biomedical imaging, tomography, tumor classifications, and finance. Analysis of high-dimensional data poses many challenges for statisticians. Feature selection and variable selection are fundamental for high-dimensional data analysis. The sparsity principle, which assumes that ...

متن کامل

Model-Free Feature Screening for Ultrahigh Dimensional Discriminant Analysis.

This work is concerned with marginal sure independence feature screening for ultra-high dimensional discriminant analysis. The response variable is categorical in discriminant analysis. This enables us to use conditional distribution function to construct a new index for feature screening. In this paper, we propose a marginal feature screening procedure based on empirical conditional distributi...

متن کامل

Ultrahigh Dimensional Feature Screening via RKHS Embeddings

Feature screening is a key step in handling ultrahigh dimensional data sets that are ubiquitous in modern statistical problems. Over the last decade, convex relaxation based approaches (e.g., Lasso/sparse additive model) have been extensively developed and analyzed for feature selection in high dimensional regime. But in the ultrahigh dimensional regime, these approaches suffer from several pro...

متن کامل

Feature Selection for Varying Coefficient Models With Ultrahigh Dimensional Covariates.

This paper is concerned with feature screening and variable selection for varying coefficient models with ultrahigh dimensional covariates. We propose a new feature screening procedure for these models based on conditional correlation coefficient. We systematically study the theoretical properties of the proposed procedure, and establish their sure screening property and the ranking consistency...

متن کامل

Feature Screening in Ultrahigh Dimensional Cox's Model.

Survival data with ultrahigh dimensional covariates such as genetic markers have been collected in medical studies and other fields. In this work, we propose a feature screening procedure for the Cox model with ultrahigh dimensional covariates. The proposed procedure is distinguished from the existing sure independence screening (SIS) procedures (Fan, Feng and Wu, 2010, Zhao and Li, 2012) in th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of business & economic statistics : a publication of the American Statistical Association

دوره 32 2  شماره 

صفحات  -

تاریخ انتشار 2014